Focus Replay Debugging Effort on the Control Plane
Authors
Abstract
Replay debugging systems enable the reproduction and debugging of non-deterministic failures in production application runs. However, no existing replay system is suitable for datacenter applications such as Cassandra, Hadoop, and Hypertable. For these large-scale, distributed, and data-intensive programs, existing replay methods either incur excessive recording overhead in production or cannot provide high-fidelity replay. In this position paper, we hypothesize and empirically verify that control plane determinism is the key to record-efficient, high-fidelity replay of datacenter applications. The key idea behind control plane determinism is that debugging does not always require a precise replica of the original application run. Instead, it often suffices to produce some run that exhibits the original behavior of the control plane, the application code responsible for controlling and managing data flow through a datacenter system.
Similar resources
An Empirical Study of the Control and Data Planes (or Control Plane Determinism is Key for Replay Debugging Datacenter Applications)
Replay debugging systems enable the reproduction and debugging of non-deterministic failures in production application runs. However, no existing replay system is suitable for datacenter applications like Cassandra, Hadoop, and Hypertable. For these large-scale, distributed, and data-intensive programs, existing methods either incur excessive production overheads or don’t scale to multi-node, t...
Replay Debugging for the Datacenter
Replay Debugging for the Datacenter, by Gautam Deepak Altekar. Doctor of Philosophy in Computer Science, University of California, Berkeley. Professor Ion Stoica, Chair. Debugging large-scale, data-intensive, distributed applications running in a datacenter (“datacenter applications”) is complex and time-consuming. The key obstacle is non-deterministic failures: hard-to-reproduce program misbehaviors...
DEFINED: Deterministic Execution for Interactive Control-Plane Debugging
Large-scale networks are among the most complex software infrastructures in existence. Unfortunately, the extreme complexity of their basis, the control-plane software, leads to a rich variety of nondeterministic failure modes and anomalies. Research on debugging modern control-plane software has focused on designing comprehensive record and replay systems, but the large volumes of recordings o...
DCR: Replay-Debugging for the Datacenter
We’ve built a tool for debugging non-deterministic failures in production datacenter applications. Our system, called DCR, is the first to efficiently record and replay large scale, distributed, and data-intensive systems such as HDFS/GFS, HBase/Bigtable, and Hadoop/MapReduce. The enabling idea behind DCR is that debugging doesn’t require a precise replica of the original datacenter run. Instea...
Replay Debugging of Complex Real-Time Systems: Experiences from Two Industrial Case Studies
Deterministic replay is a method for allowing complex multitasking real-time systems to be debugged using standard interactive debuggers. Even though several replay techniques have been proposed for parallel, multi-tasking and real-time systems, the solutions have so far lingered on a prototype academic level, with very few results to show from actual state-of-the-practice commercial applicat...